Abstract: We propose decentralized cognitive MAC protocols
that allow secondary users to independently search for spectrum
opportunities without a central coordinator or a dedicated
communication channel. Recognizing hardware and energy constraints,
we assume that a secondary user may not be able to
perform  full-spectrum  sensing  or  may  not  be  willing  to  monitor 
the  spectrum  when  it  has  no  data  to  transmit.  We  develop  an 
analytical  framework  for  opportunistic  spectrum  access  based 
on  the  theory  of  Partially  Observable  Markov  Decision  Process 
(POMDP). This decision-theoretic approach integrates the design
of  spectrum  access  protocols  at  the  MAC  layer  with  spectrum 
sensing  at  the  physical  layer  and  traffic  statistics  determined 
by  the  application  layer  of  the  primary  network.  It  also  allows 
easy  incorporation  of  spectrum  sensing  error  and  constraint  on 
the  probability  of  colliding  with  the  primary  users.  Under  this 
POMDP  framework,  we  propose  cognitive  MAC  protocols  that 
optimize  the  performance  of  secondary  users  while  limiting  the 
interference  perceived  by  primary  users.  A  suboptimal  strategy 
with reduced complexity yet comparable performance is developed.
Without additional control message exchange between the
secondary  transmitter  and  receiver,  the  proposed  decentralized 
protocols  ensure  synchronous  hopping  in  the  spectrum  between 
the  transmitter  and  the  receiver  in  the  presence  of  collisions  and 
spectrum  sensing  errors. 

Index  Terms —  Opportunistic  spectrum  access.  Cognitive  MAC. 
Partially  observable  Markov  decision  process  (POMDP). 


I.  Introduction 
A.  Opportunistic  Spectrum  Access 

The exponential growth in wireless services has
resulted in an overly crowded spectrum. The current
state of spectrum allocation indicates that almost all usable
frequencies have already been occupied. This makes one pessimistic
about the feasibility of integrating emerging wireless
services such as large-scale sensor networks into the existing
communication infrastructure.

In contrast to the apparent spectrum scarcity is the pervasive
existence of spectrum opportunity. Extensive measurements
indicate that, at any given time and location, a large portion
of licensed spectrum lies unused [1]. Even when a channel is
actively used, the bursty arrivals of many applications result
in abundant spectrum opportunities at the slot level. These
observations form the key rationale for opportunistic spectrum
access (OSA) envisioned by the DARPA XG program [2].
The idea is to exploit instantaneous spectrum availability by
opening licensed spectrum to secondary users (for example,
sensor networks). This would allow secondary users to identify
available spectrum resources and communicate in a manner
that limits the level of interference perceived by the primary
users. Even for the unlicensed spectrum, OSA may be of considerable
value in improving spectrum efficiency by supporting
both subscribers and opportunistic users.

While  conceptually  simple,  OSA  presents  challenges  not 
present  in  the  conventional  wired  or  wireless  networks.  We 
will  focus  in  this  paper  on  two  fundamental  issues  in  ad 
hoc  OSA  networks  where  there  is  no  central  coordinator  or 
dedicated  communication/control  channel. 

The  first  issue  deals  with  sensing  and  access  strategies 
that  integrate  opportunity  identification  and  exploitation.  We 
do  not  assume  that  each  secondary  user  has  full  knowledge 
of  the  availability  of  all  channels;  such  knowledge  implies 
continuous full-spectrum sensing synchronous among secondary
users. While simplifying the design of OSA networks,
continuous  full-spectrum  sensing  is  energy  inefficient  and 
hardware  demanding,  especially  for  low-cost  battery-powered 
wireless  nodes  with  bursty  traffic.  We  assume  instead  that 
each  secondary  user  can  choose  to  sense  a  subset  of  the 
possible  channels  (only  when  it  has  data  to  transmit)  and 
must  decide  whether  transmission  is  possible  based  on  the 
sensing  outcome.  When  only  part  of  the  spectrum  can  be 
sensed  at  a  particular  time,  sensing  and  access  need  to  be 
considered  jointly.  This  joint  design  also  allows  the  handling 
of  spectrum  sensing  errors  at  both  physical  and  MAC  layers  so 
that  interference  to  primary  users  is  limited  below  a  prescribed 
level. 

The  second  issue  is  transmitter-receiver  synchronization, 
which  is  unique  to  the  medium  access  control  (MAC)  in 
OSA networks. When a secondary user hops in the spectrum,
seeking opportunities that are time-varying and location-dependent,
its intended receiver needs to hop synchronously
in order to carry out the communication. In an ad hoc
OSA network with collisions and spectrum sensing errors,
maintaining transceiver synchronization without introducing
extra control message exchange is nontrivial.

Fig. 1. The underlying Markov process for N = 2.

B.  Summary  of  Results  and  Related  Work 

Summary of Results We present in this paper a cross-layer
approach to OSA that integrates spectrum sensing with
spectrum access. We adopt a decision-theoretic approach by
casting the design of OSA in the framework of Partially
Observable Markov Decision Process (POMDP). This formulation
leads to optimal policies for spectrum sensing and
access and a systematic tradeoff between performance and
complexity. Although the POMDP formulation may appear to
be natural, specifics of OSA ad hoc networks lead to a number
of nontrivial theoretical and practical issues. For example,
although the imperfect performance of spectrum sensors has
been investigated [3], sensing errors have not been integrated
into cognitive MAC design. To the best of our knowledge, the
problem of synchronizing opportunistic users in the presence
of collisions and sensing errors has received little attention.

Solving the POMDP optimally has complexity that is exponential
in the number of channels. If the number of
channels available for secondary users is relatively small, the
optimal policy can be obtained offline, and the implementation
cost is acceptable. When the number of channels is
large, searching for the optimal policy becomes impractical. By
exploiting the statistical independence of primary users, we
obtain a sufficient statistic whose dimension grows linearly
instead of exponentially with the number of channels. Based
on this reduced-dimension sufficient statistic, we propose
a suboptimal greedy approach with little complexity yet
comparable performance. We recognize that sensing in the
presence of noise and fading will lead to errors, which cause
opportunity overlook and misidentification. The latter leads
to collisions with the primary user, whose probability must be
capped below a predetermined design specification. We incorporate
the Receiver Operating Characteristics (ROC) of the spectrum
sensor into the design of OSA MAC protocols.

Related  Work  The  underutilization  of  spectrum  under  the 
current  static  spectrum  management  policy  has  stimulated  a 
flurry  of  exciting  research  activities  in  searching  for  dynamic 
spectrum  access  strategies.  A  taxonomy  of  dynamic  spectrum 
access  can  be  found  in  [4]. 


OSA  is  one  of  the  several  approaches  to  dynamic  spectrum 
access.  Differing  from  this  work  that  mainly  addresses  the 
exploitation  of  temporal  spectrum  opportunities  resulting  from 
the  bursty  traffic  of  primary  users,  a  majority  of  existing 
work  focuses  on  spatial  spectrum  opportunities  that  are  static 
or  slowly  varying  in  time.  Example  applications  include  the 
reuse  of  certain  TV-bands  that  are  not  used  for  TV  broadcast 
in  a  particular  region.  Due  to  the  slow  temporal  variation 
of  spectrum  occupancy,  real-time  opportunity  identification 
is  not  as  critical  a  component  in  this  class  of  applications, 
and  the  prevailing  approach  to  OSA  tackles  network  design 
in  two  separate  steps:  (i)  opportunity  identification  assuming 
continuous  full-spectrum  sensing;  (ii)  opportunity  allocation 
among  secondary  users  assuming  full  knowledge  of  spectrum 
opportunities.  Opportunity  identification  in  the  presence  of 
fading and noise uncertainty has been studied in [3], [5]-[8].
Spatial opportunity allocation among secondary users can
be found in [9]-[12] and references therein. For an overview
of  challenges  and  recent  development  in  OSA,  readers  are 
referred  to  [4]. 

II.  The  Network  and  Protocol  Model 
The Network Model Consider a spectrum consisting of
$N$ channels$^1$, each with bandwidth $B_i$ ($i = 1, \ldots, N$).
These $N$ channels are licensed to a primary network whose
users communicate according to a synchronous slot structure.
The traffic statistics of the primary network are such that
the occupancy of these $N$ channels follows a discrete-time
Markov process with $M = 2^N$ states. Specifically, the network
state in slot $t$ is given by $[S_1(t), \ldots, S_N(t)]$ where $S_i(t) \in
\{0\ (\text{occupied}), 1\ (\text{idle})\}$. The state diagram and a sample
path of the state evolution for $N = 2$ are illustrated in Figure 1
and Figure 2, respectively. We assume that the spectrum usage
statistics of the primary network remain unchanged for $T$ slots.
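The occupancy model above can be made concrete with a small simulation. The sketch below is illustrative only: it assumes that the $N$ channels evolve independently (an assumption the paper introduces later, in Section IV.B, not a requirement of the general model), and the names `alpha`, `beta`, `joint_transition_matrix` and `sample_path` are hypothetical. It builds the $2^N \times 2^N$ transition matrix and draws a sample path of the kind shown in Figure 2.

```python
import itertools
import random


def joint_transition_matrix(alpha, beta):
    """Build p_{i,j} over the M = 2^N joint states of N independent channels.

    alpha[k]: Pr[channel k moves from 0 (occupied) to 1 (idle)]  (assumed name)
    beta[k]:  Pr[channel k stays in 1 (idle)]                    (assumed name)
    """
    n = len(alpha)
    states = list(itertools.product((0, 1), repeat=n))

    def step_prob(k, current, nxt):
        # Probability that channel k is idle in the next slot, given its state now.
        p_idle = beta[k] if current == 1 else alpha[k]
        return p_idle if nxt == 1 else 1.0 - p_idle

    matrix = []
    for s in states:
        row = []
        for s_next in states:
            p = 1.0
            for k in range(n):
                p *= step_prob(k, s[k], s_next[k])
            row.append(p)
        matrix.append(row)
    return states, matrix


def sample_path(states, matrix, start_index, num_slots, rng):
    """Draw the network state [S_1(t), ..., S_N(t)] for num_slots slots."""
    index = start_index
    path = [states[index]]
    for _ in range(num_slots - 1):
        index = rng.choices(range(len(states)), weights=matrix[index])[0]
        path.append(states[index])
    return path


states, P = joint_transition_matrix(alpha=[0.2, 0.4], beta=[0.8, 0.6])
path = sample_path(states, P, start_index=0, num_slots=10, rng=random.Random(0))
```

For $N = 2$ this yields the four joint states of Figure 1. A correlated primary network would need the matrix specified directly rather than built as a product.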

We consider a secondary network seeking spectrum opportunities
in these $N$ channels (see Figure 2). We focus on an ad
hoc network where secondary users join/exit the network and
sense/access the spectrum independently without exchanging
local information. In each slot, a secondary user chooses a set
of channels to sense and a set of channels to access. Limited
by its hardware constraints and energy supply, a secondary
user can sense no more than $L_1$ ($L_1 \le N$) and access no
more than $L_2$ ($L_2 \le L_1$) channels in each slot.

Our  goal  is  to  develop  cognitive  MAC  protocols  for  the 
secondary  network.  For  an  ad  hoc  OSA  network  without  a 
central  coordinator  or  a  dedicated  communication  channel, 
it  is  desirable  to  have  a  decentralized  MAC  protocol  where 
each  secondary  user  independently  searches  for  spectrum 
opportunities,  aiming  at  optimizing  its  own  performance.  Such 
decentralized  protocols  do  not  rely  on  cooperation  among 
secondary  users. 

The Basic Protocol Structure Without delving into protocol
details (which are given in subsequent sections), we present
here the basic protocol structure. At the beginning of each
slot$^2$, a secondary user with data to transmit chooses a set of
channels to sense and a set of channels to access based on the
sensing outcome. Such spectrum sensing and access decisions
are made to maximize the throughput of the secondary user
while limiting the interference to the primary network by fully
exploiting the sensing history and the spectrum occupancy
statistics. When the secondary user decides to transmit, it
generates a random backoff time, and transmits when this
timer expires and no other secondary user has already accessed
the channel during the backoff time. At the end of the slot,
the receiver acknowledges a successful data transmission. The
basic slot structure is illustrated in Figure 3.

$^1$ Here we use the term channel broadly. A channel can be a frequency band
with certain bandwidth, a collection of spreading codes in a CDMA network,
or a set of tones in an OFDM system.

$^2$ The slot information can be broadcasted by the primary network.

Fig. 2. A sample path of spectrum occupancy.

Fig. 3. The slot structure (legend: data transmission, acknowledgement).
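The random backoff step can be sketched as follows. This is a hypothetical illustration, not the paper's specification: the function name `contend`, the uniform integer backoff window `max_backoff`, and the rule that tied earliest timers produce no winner are all assumptions made for the example.

```python
import random


def contend(user_ids, max_backoff, rng):
    """Return the user that wins the slot, or None if the earliest timers tie.

    Each contending secondary user draws a backoff timer uniformly from
    {0, ..., max_backoff - 1}. The user whose timer expires first transmits;
    the others see the channel in use and stay silent.
    """
    timers = {user: rng.randrange(max_backoff) for user in user_ids}
    earliest = min(timers.values())
    first = [user for user, timer in timers.items() if timer == earliest]
    # Assumption: simultaneous expiry means the secondary users collide.
    return first[0] if len(first) == 1 else None


winner = contend(["u1", "u2", "u3"], max_backoff=16, rng=random.Random(1))
```

A larger backoff window lowers the chance of ties at the cost of a longer contention period within the slot.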

III.  A  Decision-Theoretic  Approach  Based  on 
POMDP 

In  this  section,  we  develop  a  decision-theoretic  approach 
to  MAC  design  in  OSA  networks.  We  show  that  the  OSA 
network  specified  in  Section  II  can  be  modelled  by  a  POMDP 
and  the  spectrum  sensing  and  access  component  of  a  MAC 
protocol  corresponds  to  a  policy  for  this  POMDP.  Existing 
techniques  and  results  for  POMDP  can  then  be  used  to  develop 
MAC  protocols  for  OSA  networks. 

A POMDP Formulation We illustrate in Figure 4 the Markovian
dynamics of the OSA network specified in Section II. At
the beginning of each slot, a secondary user chooses a set
$A_1$ ($|A_1| \le L_1$) of channels to sense. Given that the current
state of the underlying Markov process is $j$, the user observes
$\Theta_{j,A_1} \in \{0, 1\}^{|A_1|}$, which indicates the availability of each
sensed channel. Based on this observation, the user chooses
a set $A_2 \subseteq A_1$ ($|A_2| \le L_2$) of channels to access. For the
chosen action, the user receives a reward $r_{j,A_1,A_2}$ at the end of
this slot. The sequence of operations in each slot is illustrated
in Figure 5.

The objective is to choose the sensing and access action
$\{A_1, A_2\}$ sequentially in each slot so that the total expected
reward accumulated over $T$ slots (wherein the spectrum occupancy
statistics remain unchanged) is maximized. We now
have a POMDP since, in general, the network state cannot
be fully observed due to partial spectrum monitoring ($|A_1| \le
L_1 < N$) or sensing error.

Fig. 5. The sequence of operations in a slot.

We assume in this paper that the state transition probabilities
$\{p_{i,j}\}$ are known. In practice, they may not be available.
The problem then becomes one of POMDP with unknown
transition probabilities. Such formulations are beyond the
scope of this paper. Algorithms for POMDP with an unknown
model exist in the literature [13] and are applicable to the OSA
problem.

A Sufficient Statistic For a POMDP, the internal state of the
underlying Markov process is unknown. At the beginning of
slot $t$, our knowledge of the internal state of the network based
on all past decisions and observations can be summarized by
a belief vector $\Lambda(t) = [\lambda_1(t), \ldots, \lambda_M(t)]$ where $\lambda_j(t)$ is the
conditional probability (given the decision and observation
history) that the network state is $j$ at the beginning of slot
$t$ prior to the state transition (see Figure 5).

It has been shown in [14] that for any $t$, the belief vector
$\Lambda(t)$ is a sufficient statistic for the design of the optimal action
$\{A_1, A_2\}$ in slot $t$. A policy $\pi$ for a POMDP is thus given by
a sequence of functions, each mapping from the current belief
vector $\Lambda(t)$ to the sensing and access action $\{A_1(t), A_2(t)\}$
to be taken in slot $t$, i.e., as in (1).

Under this formulation, a spectrum sensing and access
strategy is essentially a policy of this POMDP over a finite
horizon$^3$.

Reward and Objective Function The reward gained by a
secondary user in each slot can be defined in many ways
depending on the design objective. For an OSA network, the
most obvious way is to define the reward $r_{j,A_1,A_2}(t)$ as the
number of bits delivered when the user senses channels in $A_1$
and transmits using channels in $A_2$ given that the network is
in state $j$. Assume that the number of bits delivered over a
channel in one slot is proportional to its bandwidth. We define
the reward as$^4$

$$r_{j,A_1,A_2}(t) = \sum_{i \in A_2} S_i(t) B_i \qquad (2)$$

where $S_i(t) \in \{0, 1\}$ is the state of channel $i$ in slot $t$. Note
that in the presence of sensing error, a secondary user may
access an unavailable channel, resulting in a collision with a
primary user.

Let $\zeta$ denote the maximum probability of collision allowed
by the primary network. The design objective for cognitive
MAC is to maximize the expected total number of bits
transmitted in $T$ slots under the constraint that the probability
of collision does not exceed $\zeta$, i.e., the optimal policy $\pi^*$ is
given by (3), where $E_\pi$ represents the conditional expectation
given that policy $\pi$ is employed, $P_c$ is the probability of
collision, and $\Lambda(1)$ is the initial belief vector, which can be
the stationary distribution of the network state. Note that the
probability of collision $P_c$ depends on the sensing and access
policy $\pi$ as well as the operating characteristic of the spectrum
sensor.

For ease of presentation, we assume in the rest of the paper
that $L_1 = L_2 = 1$. In this case, the action taken in each
slot consists of the index $a \in \{1, \ldots, N\}$ of the channel to
be sensed and the decision $\Phi_a \in \{0\ (\text{no access}), 1\ (\text{access})\}$
on whether to transmit. Results obtained in this paper can be
readily extended to general cases.

$^3$ The optimal sensing and access strategy can also be formulated as a
POMDP over an infinite horizon. Since the statistics of channel occupancy
vary with time due to changes in traffic load, it is more appropriate to consider
the finite horizon formulation. It is, however, straightforward to extend our
formulation to an infinite horizon setup in which stationary policies can be
obtained.

$^4$ In this paper, we focus on decentralized cognitive MAC where secondary
users make independent and selfish decisions without coordination. In this
case, a secondary user chooses its spectrum sensing and access strategies
under the assumption that it will receive a reward when the chosen channel
is not used by the primary network.

IV.  Spectrum  Sensing  and  Access:  Optimal  and 
Suboptimal  Strategies 

In this section, we propose spectrum sensing and access
strategies under the POMDP framework developed in
Section  III.  We  first  assume  error-free  spectrum  sensing  to 
illustrate  the  structure  of  the  optimal  and  suboptimal  strategies. 
We  then  address  spectrum  sensing  and  access  in  the  presence 
of  sensing  error. 


A.  An  Optimal  Channel  Sensing  and  Access  Strategy 

When  the  sensing  outcome  reflects  the  true  channel  state, 
the  access  decision  is  straightforward:  transmit  if  and  only 
if  the  channel  is  sensed  to  be  available.  The  constraint  on 
the  probability  of  collision  in  (3)  becomes  irrelevant.  The 
design  objective  is  to  determine,  in  each  slot,  which  channel 
to  sense  so  that  the  expected  total  reward  obtained  in  T  slots 
is  maximized. 

Referred to as the value function, $V_t(\Lambda(t))$ denotes the
maximum expected remaining reward that can be accrued
starting from slot $t$ when the current belief vector is $\Lambda(t)$.
It has two parts: (i) the immediate reward obtained in slot $t$,
which is given by $\Theta_{j,a} B_a$ when the network is at state $j$ and
the user senses channel $a$ and observes $\Theta_{j,a} \in \{0, 1\}$; (ii) the
maximum expected remaining reward $V_{t+1}(\Lambda(t+1))$ starting
from slot $t+1$ given a belief vector $\Lambda(t+1) = \mathcal{T}(\Lambda(t) \mid a, \Theta_{j,a})$,
which represents the updated knowledge of the network state
after incorporating the action and observation obtained in slot
$t$. Averaging over all possible network states and observations,
we arrive at the following Bellman's equation (4), where the
updated belief vector $\Lambda(t+1) = \mathcal{T}(\Lambda(t) \mid a, \Theta_{j,a})$ can be easily
obtained via the Bayes rule.


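$$\Lambda(t+1) = \mathcal{T}(\Lambda(t) \mid a, \theta) = [\lambda_1(t+1), \ldots, \lambda_M(t+1)],$$

$$\lambda_j(t+1) = \frac{\sum_{i=1}^{M} \lambda_i(t)\, p_{i,j} \Pr[\Theta_{j,a} = \theta]}{\sum_{i=1}^{M} \sum_{j=1}^{M} \lambda_i(t)\, p_{i,j} \Pr[\Theta_{j,a} = \theta]} \qquad (5)$$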

From (4) we can see that an action chosen at a slot affects
the total reward in two ways: it acquires an immediate reward
$\theta B_a$ in this slot and transforms the belief vector to $\mathcal{T}(\Lambda \mid a, \theta)$,
which determines the future reward $V_{t+1}(\mathcal{T}(\Lambda(t) \mid a, \theta))$. The
optimal policy strikes a balance between gaining instantaneous
reward and gaining information for future use.

Smallwood and Sondik showed in [14] that $V_t(\Lambda(t))$ is convex
and piecewise linear as illustrated in Figure 6. Specifically,
the domain of $V_t(\Lambda(t))$ can be partitioned into a finite number
of convex regions $\{C_1(t), \ldots, C_L(t)\}$. Associated with each
region $C_i(t)$ is a vector $\Upsilon_i(t)$ such that the value function
$V_t(\Lambda(t))$ in this region is given by the inner product of $\Lambda(t)$
($\Lambda(t) \in C_i(t)$) and $\Upsilon_i(t)$. Applying this structure of the
value function to (4), we obtain (6), where $\langle \cdot, \cdot \rangle$ denotes the
inner product and $i_{\Lambda(t+1)}$ the index of the region containing
the updated belief vector $\Lambda(t+1) = \mathcal{T}(\Lambda(t) \mid a, \theta)$. Thus, if
the convex regions $\{C_i(t+1)\}$ and the associated $\Upsilon$-vectors
$\{\Upsilon_i(t+1)\}$ have been calculated for slot $t+1$, we can obtain
from (6) the optimal actions and the corresponding $\Upsilon$-vectors
for slot $t$. A linear programming algorithm is provided in
[14] to carry out this computation. The optimal policy is thus
given by the convex regions and the associated $\Upsilon$-vectors and
optimal actions for $t = 1, \ldots, T$. Note that the optimal policy
can be computed off-line and the result
stored in a table.

$$\pi = [\mu_1, \ldots, \mu_T], \quad \text{where } \mu_t : \Lambda(t) \in [0, 1]^M \to \{A_1(t), A_2(t)\} \qquad (1)$$

$$\pi^* = \arg\max_{\pi} E_\pi\Big[\sum_{t=1}^{T} r_{j(t),A_1(t),A_2(t)} \,\Big|\, \Lambda(1)\Big], \quad \text{subject to } P_c \le \zeta \qquad (3)$$

$$V_t(\Lambda(t)) = \max_{a=1,\ldots,N} \Big\{ \sum_{i=1}^{M} \lambda_i \sum_{j=1}^{M} p_{i,j} \sum_{\theta=0}^{1} \Pr[\Theta_{j,a} = \theta] \big(\theta B_a + V_{t+1}(\mathcal{T}(\Lambda(t) \mid a, \theta))\big) \Big\} \qquad (4)$$

Fig. 6. The structure of $V_t(\Lambda)$. We consider a two-state system ($M = 2$). A belief vector $\Lambda = [\lambda, 1 - \lambda]$ can be represented by a point in $[0, 1]$. As shown
above, after we observe $\theta$ in slot $t - 1$, the belief vector is transformed into a (different) point in the space of belief vectors for the succeeding slot (see (5)).
For the example above, the space of belief vectors for slot $t$ is partitioned into four regions and within each region, $V_t(\Lambda(t))$ is a linear function of $\Lambda(t)$.
The corresponding $\Upsilon$-vectors (the slope of $V_t(\Lambda(t))$ in this example) are denoted by $\{\Upsilon_1(t), \ldots, \Upsilon_4(t)\}$.
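For very small problems, the Bellman recursion (4) can be evaluated directly by enumerating actions and observations. The sketch below does exactly that under error-free sensing. It is not the linear programming algorithm of [14], its cost grows as $(2N)^T$, and all names (`optimal_value`, `channel_step`, the example `alpha` and `beta` values) are assumptions made for illustration. Channels are indexed from 0 in the code.

```python
import itertools


def optimal_value(belief, t, horizon, P, states, bandwidth):
    """Return (V_t(belief), best channel) by exhaustive recursion of (4)."""
    if t > horizon:
        return 0.0, None
    m = len(states)
    # predicted[j] = sum_i lambda_i p_{i,j}: state distribution after the transition.
    predicted = [sum(belief[i] * P[i][j] for i in range(m)) for j in range(m)]
    best_value, best_channel = float("-inf"), None
    for a in range(len(bandwidth)):
        total = 0.0
        for theta in (0, 1):
            # Error-free sensing: state j yields observation theta on channel a
            # with probability 1 if states[j][a] == theta, and 0 otherwise.
            unnormalized = [predicted[j] if states[j][a] == theta else 0.0
                            for j in range(m)]
            prob = sum(unnormalized)
            if prob == 0.0:
                continue
            next_belief = [u / prob for u in unnormalized]
            future, _ = optimal_value(next_belief, t + 1, horizon, P, states, bandwidth)
            total += prob * (theta * bandwidth[a] + future)
        if total > best_value:
            best_value, best_channel = total, a
    return best_value, best_channel


# Example with N = 2 independent channels (independence is assumed here only
# to build a transition matrix; the recursion itself does not need it).
states = list(itertools.product((0, 1), repeat=2))
alpha, beta = [0.2, 0.4], [0.8, 0.6]


def channel_step(k, current, nxt):
    p_idle = beta[k] if current == 1 else alpha[k]
    return p_idle if nxt == 1 else 1.0 - p_idle


P = [[channel_step(0, s[0], n[0]) * channel_step(1, s[1], n[1]) for n in states]
     for s in states]

start = [0.0, 0.0, 0.0, 1.0]  # both channels known to be idle
v1, a1 = optimal_value(start, 1, 1, P, states, bandwidth=[1.0, 1.0])
v3, a3 = optimal_value(start, 1, 3, P, states, bandwidth=[1.0, 1.0])
```

With a one-slot horizon the recursion reduces to picking the channel most likely to be idle, so `v1` equals 0.8 here. Longer horizons also weigh the information gained by sensing.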

B.  A  Reduced-State  Suboptimal  Strategy 

Finding  the  optimal  policy  for  a  general  POMDP  can  be 
computationally prohibitive. One reason is that the dimension
of the sufficient statistic $\Lambda$ grows exponentially with the
number $N$ of channels. Although the optimal policy can be
computed  off-line  and  stored  before  a  secondary  user  starts  to 
access  the  spectrum,  this  approach  makes  it  difficult  to  adapt  to 
changes  in  the  spectrum  occupancy  statistics.  It  is  thus  crucial 
to  exploit  the  specific  structure  of  the  problem  and  develop 
suboptimal  strategies  with  reduced  complexity.  In  this  section, 
we  show  that  when  channels  evolve  independently,  we  can  find 
a  sufficient  statistic  for  the  optimal  policy  whose  dimension 
grows  linearly  with  N. 

Proposition 1: Let $\Omega = [\omega_1, \ldots, \omega_N]$ where $\omega_i$ is the
probability (conditioned on the sensing and decision history)
that channel $i$ is available at the beginning of a slot. Then $\Omega$
is a sufficient statistic for the optimal OSA protocol under $N$
independent channels.

Proof:  See  Appendix. 

Proposition 1 shows that by exploiting the statistical independence
among channels, we can reduce the dimension of
the sufficient statistic from $2^N$ to $N$. This result points to
the possibility of significantly reducing the computational and
storage complexity of the optimal OSA protocol.

Fig. 7. The Markov channel model.
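One way to see the dimension reduction is that, when channels are independent, the $2^N$-dimensional joint belief can be rebuilt from the $N$ per-channel probabilities as a product. The sketch below shows this mapping; the product form is stated here as an assumption consistent with independence, not as a quotation of the appendix proof, and the function name `joint_from_marginals` is hypothetical.

```python
import itertools


def joint_from_marginals(omega):
    """Map Omega = [omega_1, ..., omega_N] to a belief over the 2^N joint states.

    Assumes channels are independent, so each joint probability is the
    product of per-channel probabilities.
    """
    states = list(itertools.product((0, 1), repeat=len(omega)))
    joint = []
    for s in states:
        p = 1.0
        for w, bit in zip(omega, s):
            p *= w if bit == 1 else 1.0 - w
        joint.append(p)
    return states, joint


states, joint = joint_from_marginals([0.5, 0.25])
```

Storing `omega` needs $N$ numbers, whereas `joint` needs $2^N$; for $N = 20$ that is 20 values against roughly one million.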

Based on the sufficient statistic $\Omega$, we propose a suboptimal
protocol based on a greedy approach that maximizes per-slot
throughput$^5$. Assume that channels evolve independently.
As illustrated in Figure 7, channel $i$ transits from state 0
(unavailable) to state 1 (available) with probability $\alpha_i$ and
stays in state 1 with probability $\beta_i$. Given that our knowledge
of the network state is $\Omega(t)$ at the beginning of slot $t$, prior to
the state transition, the expected reward to be gained in slot $t$
if channel $a$ is selected is

$$(\omega_a(t)\beta_a + (1 - \omega_a(t))\alpha_a) B_a, \qquad (7)$$

where $(\omega_a \beta_a + (1 - \omega_a)\alpha_a)$ is the probability that channel
$a$ will be available in slot $t$. For the greedy approach, the
action in slot $t$ is chosen to maximize the expected immediate
reward, i.e., the index $a_*(t)$ of the chosen channel is given by

$$a_*(t) = \arg\max_{a=1,\ldots,N} (\omega_a(t)\beta_a + (1 - \omega_a(t))\alpha_a) B_a. \qquad (8)$$

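$^5$ When channels are correlated, we can similarly develop a greedy approach
based on $\Lambda$.

$$V_t(\Lambda(t)) = \max_{a=1,\ldots,N} \Big\{ \sum_{i=1}^{M} \lambda_i \sum_{j=1}^{M} p_{i,j} \sum_{\theta=0}^{1} \Pr[\Theta_{j,a} = \theta] \big(\theta B_a + \langle \Lambda(t+1), \Upsilon_{i_{\Lambda(t+1)}}(t+1) \rangle\big) \Big\} \qquad (6)$$

$$\Omega(t+1) = [\omega_1(t+1), \ldots, \omega_N(t+1)] = \mathcal{T}(\Omega(t) \mid a_*(t), \Theta_{a_*}(t)),$$

$$\omega_i(t+1) = \begin{cases} 1 & \text{if } a_*(t) = i,\ \Theta_{a_*}(t) = 1 \\ 0 & \text{if } a_*(t) = i,\ \Theta_{a_*}(t) = 0 \\ \omega_i(t)\beta_i + (1 - \omega_i(t))\alpha_i & \text{if } a_*(t) \ne i \end{cases} \qquad (9)$$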

Fig.  8.  ROC  curves  for  detecting  Gaussian  signal  in  Gaussian  noise. 


At the end of slot $t$, the belief vector $\Omega$ is updated based on
the action $a_*(t)$ and the observation $\Theta_{a_*}(t)$ (indicating the
availability of channel $a_*$) as in (9). Note that when a channel
is not sensed, the probability of its availability is updated
according to the Markov chain. If the channel is sensed, the
state of this channel is the sensing outcome since the belief
vector records the channel state prior to the state transition at
the beginning of each slot.

Let $W_t(\Omega)$ denote the expected remaining reward starting
from slot $t$ achieved by the greedy approach. We obtain
a recursive equation for $W_t(\Omega)$ as in (10), where $a_*$ and
$\mathcal{T}(\Omega \mid a, \theta)$ are given by (8) and (9), respectively.
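The greedy rule (8), the belief update (9) and the recursion (10) fit in a few lines under error-free sensing. The following is a minimal sketch; the function names and the 0-based channel indexing are assumptions made for the example.

```python
def idle_next(omega, alpha, beta):
    """Probability that each channel is idle in the coming slot (the factor in (7))."""
    return [w * b + (1.0 - w) * a for w, a, b in zip(omega, alpha, beta)]


def greedy_channel(omega, alpha, beta, bandwidth):
    """Channel index maximizing the expected immediate reward, as in (8)."""
    scores = [p * bw for p, bw in zip(idle_next(omega, alpha, beta), bandwidth)]
    return max(range(len(scores)), key=scores.__getitem__)


def update_omega(omega, alpha, beta, chosen, observed):
    """Belief update (9): the sensed channel is known, the others follow the chain."""
    new_omega = idle_next(omega, alpha, beta)
    new_omega[chosen] = float(observed)
    return new_omega


def greedy_value(omega, t, horizon, alpha, beta, bandwidth):
    """Expected remaining reward W_t(omega) of the greedy approach, as in (10)."""
    if t > horizon:
        return 0.0
    a = greedy_channel(omega, alpha, beta, bandwidth)
    p_idle = idle_next(omega, alpha, beta)[a]
    busy_branch = greedy_value(update_omega(omega, alpha, beta, a, 0),
                               t + 1, horizon, alpha, beta, bandwidth)
    idle_branch = greedy_value(update_omega(omega, alpha, beta, a, 1),
                               t + 1, horizon, alpha, beta, bandwidth)
    return p_idle * bandwidth[a] + (1.0 - p_idle) * busy_branch + p_idle * idle_branch


alpha, beta, bandwidth = [0.2, 0.4], [0.8, 0.6], [1.0, 1.0]
omega = [1.0, 0.0]
choice = greedy_channel(omega, alpha, beta, bandwidth)
after_idle = update_omega(omega, alpha, beta, choice, observed=1)
w1 = greedy_value(omega, 1, 1, alpha, beta, bandwidth)
w2 = greedy_value(omega, 1, 2, alpha, beta, bandwidth)
```

Each slot costs $O(N)$ to select a channel and update the belief, which is what makes the greedy approach attractive when $N$ is large. Evaluating $W_t$ by this recursion still branches on every observation and is only practical for short horizons.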

The computational complexity of the greedy approach and
a systematic way of trading off performance with complexity
are studied in the context of energy-constrained OSA in [15].


C.  Spectrum  Sensing  and  Access  in  the  Presence  of  Sensing 
Error 

We  now  consider  the  scenario  where  sensing  errors  cannot 
be  ignored.  In  this  case,  not  only  the  sensing  and  access 
strategy  but  also  the  operating  characteristics  of  the  spectrum 
sensor  affect  the  performance  of  the  OSA  network  and  the 
interference  perceived  by  the  primary  network.  The  problem 
thus  includes  the  design  of  the  spectrum  sensor  as  well  as  the 
sensing  and  access  strategy  for  optimal  spectrum  utilization 
under  a  constraint  on  the  maximum  collision  probability. 


Spectrum sensors perform a binary hypotheses test: $H_0$ (null
hypothesis indicating that the sensed channel is available) vs.
$H_1$ (alternative). If the sensor of a secondary user mistakes
$H_0$ for $H_1$ (false alarm), the secondary user may refrain from
transmitting, and a spectrum opportunity is overlooked. On
the other hand, if the detector mistakes $H_1$ for $H_0$ (miss
detection), a misidentification of spectrum opportunity occurs;
the secondary user collides with a primary user if it trusts
the sensing outcome. Let $\epsilon$ and $\delta$ denote, respectively, the
overlook (false alarm) and misidentification (miss detection)
probabilities. The performance of the sensor is specified by the
Receiver Operating Characteristics (ROC) curve which gives
$1 - \delta$ as a function of $\epsilon$ (examples are given in Figure 8). The
objective is to design the optimal spectrum sensing and access
policy $\pi^*$ and the operating point $\delta^*$ (on the ROC curve) of
the spectrum sensor, as formulated in (11).

The above optimization is a constrained POMDP problem
which generally requires a randomized optimal policy. To
obtain a deterministic strategy with low complexity, we aim at
separating the objective function of (11) from the constraint.
Specifically, we choose the sensor operating point based on
the constraint on the probability of collision: $\delta^* = \zeta$. In this
case, the optimal access policy is given by

$$\Phi_a = \begin{cases} 1 & \text{if } \Theta_a = 1 \\ 0 & \text{if } \Theta_a = 0 \end{cases} \qquad (12)$$

Since the probability of misidentification of the spectrum
sensor is $\zeta$, the probability of colliding with a primary user
under this access strategy is $\zeta$, satisfying the design constraint.
The problem is then reduced to an unconstrained POMDP
where the optimal policy for channel selection is obtained to
maximize the throughput of the secondary user.

Both  the  optimal  and  suboptimal  greedy  approaches  pre¬ 
sented  in  Section  IV.A-B  can  be  extended  to  incorporate  sens¬ 
ing  error.  We  consider  here  the  greedy  strategy  that  maximizes 
the  per-slot  throughput.  Let  Ua  denote  the  number  of  bits  that 
can  be  successfully  delivered  if  channel  a  is  chosen.  The  index 
a*  of  the  channel  to  be  selected  is  then  given  by  (13). 

The  information  gained  at  the  transmitter  in  slot  t  includes 
the  decision  {a* ,  <j>a, }  and  the  observation  {0a„ ,  Kat }  where 
Ka,  £  {0,1}  indicates  whether  an  acknowledgement  is 
received  at  the  end  of  this  slot6.  The  information  gained  at  the 
receiver,  however,  includes  only  a*  and  Ka  since  the  receiver 
does  not  have  the  sensing  outcome  0a»  at  the  transmitter 
(due  to  sensing  error)  and  cannot  distinguish  an  unsuccessful 
transmission  from  the  no-access  decision  (ha,  =  0  of  the 


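⁶ The transmission of acknowledgement is assumed to be error-free.

$$
\begin{aligned}
W_t(\Omega) &= (\omega_{a_*}\beta_{a_*} + (1-\omega_{a_*})\alpha_{a_*})B_{a_*} + \sum_{\theta=0}^{1} \Pr[\Theta_{a_*} = \theta \mid \Omega, a_*]\, W_{t+1}(T(\Omega \mid a_*, \theta)) \\
&= (\omega_{a_*}\beta_{a_*} + (1-\omega_{a_*})\alpha_{a_*})B_{a_*} + [\omega_{a_*}(1-\beta_{a_*}) + (1-\omega_{a_*})(1-\alpha_{a_*})]\, W_{t+1}(T(\Omega \mid a_*, 0)) \\
&\quad + [\omega_{a_*}\beta_{a_*} + (1-\omega_{a_*})\alpha_{a_*}]\, W_{t+1}(T(\Omega \mid a_*, 1)) \qquad (10)
\end{aligned}
$$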


$$
\{\pi^*, \delta^*\} = \arg\max_{\{\pi, \delta\}} E_\pi\Big[\sum_{t=1}^{T} r_{j,A_1(t),A_2(t)} \,\Big|\, \Lambda(1)\Big], \quad \text{subject to } P_c \le \zeta \qquad (11)
$$

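$$
\begin{aligned}
a_* &= \arg\max_{a=1,\cdots,N} E[U_a \mid \Omega] = \arg\max_{a=1,\cdots,N} \{B_a \Pr[S_a = 1, \Theta_a = 1 \mid \Omega]\} \\
&= \arg\max_{a=1,\cdots,N} \{B_a (1-\epsilon)(\omega_a\beta_a + (1-\omega_a)\alpha_a)\} \qquad (13)
\end{aligned}
$$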

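$$
\Omega(t+1) = T(\Omega(t) \mid a_*, K_{a_*}) = [\omega_1(t+1), \cdots, \omega_N(t+1)]
$$

$$
\omega_i(t+1) = \Pr[S_i(t) = 1 \mid \Omega(t), a_*, K_{a_*}] =
\begin{cases}
\omega_i\beta_i + (1-\omega_i)\alpha_i & \text{if } a_* \neq i \\
1 & \text{if } a_* = i,\ K_{a_*} = 1 \\
\dfrac{\epsilon(\omega_{a_*}\beta_{a_*} + (1-\omega_{a_*})\alpha_{a_*})}{\epsilon(\omega_{a_*}\beta_{a_*} + (1-\omega_{a_*})\alpha_{a_*}) + (\omega_{a_*}(1-\beta_{a_*}) + (1-\omega_{a_*})(1-\alpha_{a_*}))} & \text{if } a_* = i,\ K_{a_*} = 0
\end{cases}
\qquad (14)
$$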

transmitter. In order for the transmitter and the receiver to
arrive at the same belief vector $\Omega(t+1)$, which ensures that
they tune to the same channel in the next slot (see (13)),
the belief vector should be updated at both the transmitter
and the receiver using only the common information $a_*$ and
$K_{a_*}$. The belief vector $\Omega(t+1)$ is thus given by (14),
which is obtained from (15) and (16), where (15) follows
from $\Pr[S_{a_*}(t) = 1, \Theta_{a_*} = 1, K_{a_*} = 0 \mid \Omega(t)] = 0$ (a
transmission over an available channel is always
acknowledged⁷) and (16) from $\Pr[K_{a_*} = 0 \mid S_{a_*}(t) = 0] = 1$
(no acknowledgement is received given that the channel
is unavailable), together with $\Pr[K_{a_*} = 0 \mid S_{a_*}(t) = 1] = \Pr[\Theta_{a_*} = 0 \mid S_{a_*}(t) = 1] = \epsilon$.
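The belief update in (14) can be sketched as follows. This is an illustrative implementation under the notation used above, where the function and argument names are assumptions; the guard against a zero denominator covers the degenerate case of a perfect sensor on a channel believed available with certainty and is likewise an assumption of the sketch.

```python
def update_belief(omega, alpha, beta, a_star, ack, eps):
    """Return Omega(t+1) from Omega(t) using only the common information (a*, K).

    omega  : list of beliefs omega_i(t)
    alpha  : list of busy-to-available transition probabilities
    beta   : list of available-to-available transition probabilities
    a_star : 0-based index of the channel chosen in this slot
    ack    : True if an acknowledgement was received (K = 1)
    eps    : false-alarm probability of the spectrum sensor
    """
    new_omega = []
    for i, w in enumerate(omega):
        # Probability that channel i is available in this slot given Omega(t).
        p_available = w * beta[i] + (1.0 - w) * alpha[i]
        if i != a_star:
            # Unobserved channel: propagate through the Markov chain.
            new_omega.append(p_available)
        elif ack:
            # An acknowledgement can only follow a transmission on an available channel.
            new_omega.append(1.0)
        else:
            # No acknowledgement: either a false alarm on an available channel
            # (probability eps * p_available) or the channel was busy.
            denom = eps * p_available + (1.0 - p_available)
            new_omega.append(eps * p_available / denom if denom > 0.0 else 0.0)
    return new_omega


if __name__ == "__main__":
    print(update_belief([0.5, 0.5], [0.2, 0.2], [0.8, 0.8], a_star=0, ack=False, eps=0.1))
```

With two identical channels (alpha = 0.2, beta = 0.8), a belief of 0.5 and eps = 0.1, a missing acknowledgement lowers the belief of the chosen channel to 0.05/0.55 = 1/11, while the unobserved channel stays at 0.5.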

The above specifies the spectrum sensing and access strategy
when the spectrum sensor operates at $\delta = \zeta$ (see Figure 8).
Implementation details of this protocol are given in Figure 9.
The implementation of the optimal sensing strategy can be
similarly obtained.

The optimal joint design of spectrum sensor and sensing/access
strategy given in (11) is studied in [16], where
a separation principle is established that leads to simple,
deterministic, yet optimal solutions.

V. Protocol Specifics of Decentralized Cognitive MAC

In this section, we present protocol specifics of the proposed
cognitive MAC strategies. Functions (other than spectrum
sensing and access) of cognitive MAC protocols are identified
and protocol implementations are presented by considering
two network settings with increasing complexity.

⁷ We assume that a collision with a primary user cannot be distinguished
from that with a secondary user and no statistical information on the
occurrence of collisions with secondary users is available. The belief vector
is thus updated based on the assumption that an unsuccessful transmission
results from a collision with a primary user.

A. OSA Networks with Spatially Invariant Spectrum Opportunity

We  consider  first  a  secondary  network  where  every  user  is 
affected  by  the  same  set  of  primary  users.  In  this  case,  the 
state  of  a  channel  is  the  same  at  both  the  transmitter  and 
the  receiver.  Detection  of  channel  state  can  thus  be  carried 
out  at  the  transmitter  alone.  The  main  issue  that  needs  to  be 
addressed  by  the  MAC  protocol  is  transceiver  synchronization: 
the  transmitter  and  the  receiver  need  to  tune  to  the  same 
channel  in  order  to  communicate,  and  they  need  to  hop 
synchronously.  The  synchronization  problem  can  be  separated 
into  two  phases:  the  initial  handshake  between  the  transmitter 
and  the  receiver  and  the  synchronous  hopping  in  the  spectrum 
after  the  initial  establishment  of  communication. 

There are a number of standard implementations to facilitate
the initial handshake. Here we borrow the idea of receiver-oriented
code assignment in CDMA ad hoc networks [17].
Specifically,  each  secondary  user  is  assigned  a  set  of  channels 
(not  necessarily  unique)  which  it  monitors  regularly  to  check 
whether  it  is  an  intended  receiver.  A  user  with  a  message  for, 
say,  user  A  will  transmit  a  handshake  signal  over  one  of  the 
channels  assigned  to  user  A.  Once  the  initial  communication 
is  established,  the  transmitter  and  the  receiver  will  implement 
the  same  spectrum  sensing  and  access  strategy  which  governs 
channel  selection  in  each  slot.  We  show  below  that  the 
sensing  and  access  strategies  developed  in  Section  IV  ensure 
synchronous  hopping  between  the  transmitter  and  the  receiver 
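in the presence of collisions and sensing errors.

$$
\begin{aligned}
\Pr[S_{a_*}(t) = 1 \mid \Omega(t), K_{a_*} = 0]
&= \frac{\Pr[S_{a_*}(t) = 1, \Theta_{a_*} = 0 \mid \Omega(t)] + \Pr[S_{a_*}(t) = 1, \Theta_{a_*} = 1, K_{a_*} = 0 \mid \Omega(t)]}{\Pr[K_{a_*} = 0 \mid \Omega(t)]} \\
&= \frac{\Pr[\Theta_{a_*} = 0 \mid S_{a_*}(t) = 1] \Pr[S_{a_*}(t) = 1 \mid \Omega(t)]}{\Pr[K_{a_*} = 0 \mid S_{a_*}(t) = 1] \Pr[S_{a_*}(t) = 1 \mid \Omega(t)] + \Pr[K_{a_*} = 0 \mid S_{a_*}(t) = 0] \Pr[S_{a_*}(t) = 0 \mid \Omega(t)]} \qquad (15) \\
&= \frac{\Pr[\Theta_{a_*} = 0 \mid S_{a_*}(t) = 1] \Pr[S_{a_*}(t) = 1 \mid \Omega(t)]}{\Pr[\Theta_{a_*} = 0 \mid S_{a_*}(t) = 1] \Pr[S_{a_*}(t) = 1 \mid \Omega(t)] + \Pr[S_{a_*}(t) = 0 \mid \Omega(t)]} \qquad (16)
\end{aligned}
$$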


Decentralized Cognitive MAC

At the beginning of slot t, with a belief vector $\Omega(t)$ at both the transmitter and the receiver,

1) both the transmitter and the receiver choose channel $a_*$ given by (13);

2) the transmitter senses channel $a_*$ and obtains the sensing outcome $\Theta_{a_*}$;

3) the transmitter chooses the access action $\Phi_{a_*}$ according to (12);

4) if $\Phi_{a_*} = 1$, the transmitter transmits data over channel $a_*$ using carrier sensing;

5) if a data packet is successfully received, the receiver transmits acknowledgement $K_{a_*}$;

6) both the transmitter and the receiver obtain the new belief vector $\Omega(t+1)$ using $\{a_*, K_{a_*}\}$ according to (14).

Fig. 9. Protocol description for OSA networks with spatially invariant spectrum opportunity.


Proposition  2:  In  OSA  networks  with  spatially  invariant 
spectrum  opportunity,  the  proposed  cognitive  MAC  protocols 
ensure  transceiver  synchronization  in  the  presence  of  collisions 
and  spectrum  sensing  errors. 

Proof:  We  focus  on  the  suboptimal  greedy  approach  outlined 
in  Figure  9.  The  same  conclusion  can  be  drawn  for  the  optimal 
strategy.  It  is  easy  to  see  from  the  protocol  description  given 
in  Figure  9  that  the  transmitter  and  the  receiver  maintain  the 
same  belief  vector  independent  of  collision  and  sensing  error. 
Since  the  channel  selection  is  determined  by  the  belief  vector, 
transceiver  synchronization  is  maintained. 
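The argument of the proof can be checked numerically with a small simulation of the steps in Figure 9, sketched below. The transmitter and the receiver each keep their own copy of the belief vector and update it using only the chosen channel and the acknowledgement. The initial belief of 0.5, the random seed, and all function names are assumptions of this sketch.

```python
import random


def p_available(w, a, b):
    """Probability that a channel is available in this slot, given belief w."""
    return w * b + (1.0 - w) * a


def select(omega, alpha, beta, bw):
    # The common factor (1 - eps) in (13) does not change the maximizer.
    return max(range(len(omega)),
               key=lambda i: bw[i] * p_available(omega[i], alpha[i], beta[i]))


def update(omega, alpha, beta, a_star, ack, eps):
    """Belief update (14) using only the common information (a*, K)."""
    new = []
    for i, w in enumerate(omega):
        p = p_available(w, alpha[i], beta[i])
        if i != a_star:
            new.append(p)
        elif ack:
            new.append(1.0)
        else:
            denom = eps * p + (1.0 - p)
            new.append(eps * p / denom if denom > 0.0 else 0.0)
    return new


def simulate(slots, alpha, beta, bw, eps, delta, seed=0):
    """Run the slot procedure; eps = false alarm, delta = miss detection."""
    rng = random.Random(seed)
    n = len(alpha)
    state = [rng.random() < 0.5 for _ in range(n)]  # True means available
    tx_belief = [0.5] * n  # assumed common initial belief
    rx_belief = [0.5] * n
    hops = []
    for _ in range(slots):
        # Channel states evolve as independent two-state Markov chains.
        state = [rng.random() < (beta[i] if state[i] else alpha[i]) for i in range(n)]
        a_tx = select(tx_belief, alpha, beta, bw)   # step 1 at the transmitter
        a_rx = select(rx_belief, alpha, beta, bw)   # step 1 at the receiver
        # Step 2: imperfect sensing at the transmitter only.
        if state[a_tx]:
            sensed_available = rng.random() >= eps
        else:
            sensed_available = rng.random() < delta
        transmit = sensed_available                 # step 3, access rule (12)
        # Steps 4-5: an acknowledgement needs an available channel and a tuned receiver.
        ack = transmit and state[a_tx] and a_tx == a_rx
        # Step 6: both ends update from (a*, K) only.
        tx_belief = update(tx_belief, alpha, beta, a_tx, ack, eps)
        rx_belief = update(rx_belief, alpha, beta, a_rx, ack, eps)
        hops.append((a_tx, a_rx))
    return hops, tx_belief, rx_belief


if __name__ == "__main__":
    hops, tx, rx = simulate(1000, [0.2] * 3, [0.8] * 3, [1.0] * 3, eps=0.1, delta=0.1)
    print("always on the same channel:", all(a == b for a, b in hops))
```

Because the sensing outcome never enters the update, the two belief vectors remain identical in every slot regardless of false alarms and collisions, which is the content of Proposition 2.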

B. OSA Networks with Spatially Varying Spectrum Opportunity

When  secondary  users  are  affected  by  different  sets  of 
primary  users,  the  state  of  spectrum  occupancy  is  location 
dependent;  a  channel  that  is  idle  at  a  transmitter  may  not 
be  idle  at  the  corresponding  receiver.  This  spatial  variation 
of  spectrum  occupancy  results  in  new  design  challenges  as 
specified  below. 


Fig. 11. The protocol structure for OSA networks with spatially varying
spectrum opportunity (diagram labels: Tx, Rx, ACK, SLOT).


Spectrum Opportunity Identification. As illustrated in Figure 10,
the state of a channel at the transmitter A is determined
by the transmission activities of those primary users within
A's receiving range r while the state of this channel at the
receiver B is determined by primary users within B's receiving
range. Since a channel only presents an opportunity to a pair
of secondary users if it is available at both the transmitter
and the receiver, spectrum opportunities need to be identified
jointly by the transmitter and the receiver⁸.

We propose the following modification to the basic protocol
structure illustrated in Figure 3 to address the issue of opportunity
identification. As shown in Figure 11, at the beginning
of a slot, the transmitter monitors the channel for a period of
time to ensure the required sensing accuracy. If the channel
is sensed to be available, the transmitter generates a random
backoff time. If the channel remains idle when the backoff time
expires, the transmitter sends a short request-to-send (RTS) message
to the receiver, indicating that the channel is available at the
transmitter. The receiver, upon receiving the RTS, replies with
a clear-to-send (CTS) message if the channel is also available
at the receiver. A successful exchange of RTS-CTS completes

⁸ In this case, $S_i(t) = 1$ if channel i is available at both the transmitter
and the receiver. Otherwise, $S_i(t) = 0$. Strictly speaking, the availability
of a channel at the secondary transmitter is determined by primary receivers
rather than primary transmitters in its neighborhood [4]. The detection of
primary receivers can be transformed to the detection of primary transmitters.
A detailed presentation can be found in [4].

Fig. 12. Performance of the optimal cognitive MAC protocol under different
spectrum occupancy statistics (N = 3 independent channels with the same
bandwidth B = 1 and transition probabilities $\{\alpha, \beta\}$).


the  identification  of  a  spectrum  opportunity. 
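The opportunity identification procedure described above (sensing, random backoff, RTS, CTS) can be sketched for one slot as follows. The sketch assumes that all contending transmitters are within range of each other, that equal backoff values cause the RTS messages to collide, and that the backoff window size is a free parameter; none of these details are specified in the text, and the function name is hypothetical.

```python
import random


def rts_cts_slot(contenders, idle_at_tx, idle_at_rx, rng, max_backoff=32):
    """Return the (tx, rx) pair that identifies an opportunity, or None.

    contenders : list of (tx, rx) name pairs competing for the same channel
    idle_at_tx : dict mapping a transmitter name to its sensing outcome
    idle_at_rx : dict mapping a receiver name to the channel state it sees
    """
    # Only transmitters that sense the channel available draw a backoff.
    backoffs = {pair: rng.randrange(max_backoff)
                for pair in contenders if idle_at_tx[pair[0]]}
    if not backoffs:
        return None
    # The smallest backoff expires first; the others then find the channel busy.
    first = min(backoffs.values())
    winners = [pair for pair, b in backoffs.items() if b == first]
    if len(winners) > 1:
        return None  # assumed: simultaneous RTS messages collide
    tx, rx = winners[0]
    # The receiver answers with a CTS only if the channel is available there too.
    return (tx, rx) if idle_at_rx[rx] else None


if __name__ == "__main__":
    rng = random.Random(0)
    print(rts_cts_slot([("A", "B")], {"A": True}, {"B": True}, rng))
    print(rts_cts_slot([("A", "B")], {"A": True}, {"B": False}, rng))
```

A pair is returned only when the channel is available at both ends, which matches the joint definition of a spectrum opportunity used in this subsection.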

Hidden and Exposed Terminals. The presence of hidden and
exposed terminals is a classical problem in MAC design for
multihop ad hoc networks. In an OSA network as illustrated
in Figure 10, hidden terminals are secondary users within the
secondary receiver's range but outside the secondary transmitter's
range (for example, D) while exposed terminals are
secondary users within the secondary transmitter's range but
outside the secondary receiver's range (for example, C). Since
hidden terminals can cause collisions and exposed terminals
may lead to wasted opportunities⁹, the ability to deal with
hidden and exposed terminals is crucial to the efficiency of
cognitive MAC protocols.

In the proposed protocols, the RTS-CTS exchange has dual
functions. Besides facilitating opportunity identification, it
also mitigates the hidden and exposed terminal problem as in
a conventional communication network [18]. Other collision
avoidance schemes such as busy tone and dual busy tone may
be incorporated to further reduce the occurrence of collision.

Transceiver Synchronization. The issue of initial handshake
and transceiver synchronization is similar to that in the first
case. The protocol implementation specifics can be similarly
defined as in Figure 9. It is easy to show that in OSA
networks with spatially varying spectrum opportunity, the proposed
protocol maintains the same update of the belief vector
at the transmitter and the receiver, and thus ensures transceiver
synchronization as stated in the proposition below.

Proposition  3:  In  OSA  networks  with  spatially  varying 
spectrum  opportunity,  the  proposed  cognitive  MAC  protocols 
ensure  transceiver  synchronization  in  the  presence  of  collisions 
and  spectrum  sensing  errors. 

⁹ If C chooses to sense the channel used by A, C concludes that the
channel is not available and thus refrains from transmitting. This leads to a wasted
opportunity since C's transmission will not interfere with the communication
between A and B.


VI.  Numerical  and  Simulation  Results 

In this section, we present numerical and simulation results
to evaluate the performance of the proposed optimal and
suboptimal cognitive MAC protocols.

Optimal Performance and the Impact of Primary Network
Traffic Statistics. In these examples, we assume sensing
errors can be ignored and focus on one secondary user. Shown
in Figure 12 is the performance of the optimal protocol
under different spectrum occupancy statistics. We consider
three independent channels with the same bandwidth B = 1
and transition probabilities $\{\alpha, \beta\}$ (see Figure 7). In Case 1,
the  channel  state  remains  unchanged  with  a  large  probability 
0.8.  This  corresponds  to  bursty  traffic  arrivals  in  the  primary 
network:  both  the  message  length  and  the  inter-arrival  time 
are  large.  Case  3  is  the  opposite  of  Case  1:  the  inter-arrival 
time  and  the  message  length  of  the  primary  network  are 
relatively  small,  resulting  in  more  frequent  changes  in  the 
channel  state.  In  Case  2,  the  channel  is  equally  likely  to  change 
the  state  or  remain  at  the  current  state.  Note  that  in  all  three 
cases,  the  channels  have  the  same  stationary  distribution:  with 
probability  0.5  a  channel  is  available  (busy).  We  see  from 
Figure  12  that  although  the  average  traffic  load  (indicated  by 
the  stationary  distribution)  of  the  primary  network  is  the  same 
in  all  cases,  different  traffic  statistics  of  the  primary  network 
lead to different performance of the secondary network. OSA
is the most effective when it is overlaid on a primary
network with large inter-arrival time and message length.
Furthermore, the throughput of the secondary user increases
over time in Cases 1 and 3, which results from the improved
information  on  the  network  state  drawn  from  accumulating 
observations.  This  demonstrates  the  cognitive  nature  of  the 
proposed  protocol  under  the  POMDP  formulation.  Note  that 
in  Case  2,  the  Markovian  channel  state  evolution  degenerates 
to  an  i.i.d.  sequence.  No  information  can  be  gained  from  past 
observations  and  the  optimal  strategy  reduces  to  a  random 
selection  of  channels. 
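The claim that all three cases share the same stationary distribution can be verified with a short calculation. In the sketch below, $\alpha$ is the busy-to-available transition probability and $\beta$ the probability of remaining available, consistent with (13); the values for Cases 2 and 3 are inferred from the verbal descriptions above and should be read as assumptions.

```python
def stationary_available(alpha, beta):
    """Stationary probability that a two-state Markov channel is available.

    Solves p = p * beta + (1 - p) * alpha for p.
    """
    return alpha / (1.0 - beta + alpha)


def next_belief(omega, alpha, beta):
    """One-step prediction of availability from the current belief omega."""
    return omega * beta + (1.0 - omega) * alpha


# Case 1 is given in the text (state unchanged with probability 0.8);
# Cases 2 and 3 are assumed from their descriptions.
CASES = {
    "Case 1": (0.2, 0.8),
    "Case 2": (0.5, 0.5),
    "Case 3": (0.8, 0.2),
}

if __name__ == "__main__":
    for name, (alpha, beta) in CASES.items():
        print(name, "stationary availability:", stationary_available(alpha, beta))
    # In Case 2 the prediction ignores the current belief (i.i.d. channel states).
    print([next_belief(w, 0.5, 0.5) for w in (0.0, 0.3, 1.0)])
```

In Case 2 the one-step prediction equals 0.5 for every value of the current belief, which is why past observations carry no information and the optimal strategy reduces to random channel selection.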

In  Figure  13  we  study  the  performance  of  the  suboptimal 
greedy  approach  as  compared  with  the  optimal  protocol.  For 
the  channels  used  in  the  upper  plot,  the  performance  of  the 
greedy  approach  matches  that  of  the  optimal  protocol.  For  the 
channels  in  the  lower  plot,  the  performance  loss  of  the  greedy 
approach  is  within  3%.  In  both  cases,  significant  gain  over 
a  random  selection  of  channels  is  achieved  by  the  proposed 
cognitive  protocols. 

Spectrum Efficiency in the Presence of Sensing Error. We
now study the performance of the proposed OSA MAC
protocols based on a specific spectrum sensor at the physical
layer. At the beginning of each slot, the sensor takes L
measurements $\{Y_i\}_{i=1}^{L}$ of the chosen channel. We assume that
both the channel noise and the signal of primary users can
be modelled as white Gaussian processes. We then have the
binary hypothesis test

$$
\begin{cases}
\mathcal{H}_0 \text{ (when channel is idle)}: & Y_i \sim \mathcal{N}(0, \sigma_0^2), \quad i = 1, \cdots, L \\
\mathcal{H}_1 \text{ (when channel is busy)}: & Y_i \sim \mathcal{N}(0, \sigma_1^2), \quad i = 1, \cdots, L
\end{cases}
$$

The SNR is given by $(\sigma_1^2 - \sigma_0^2)/\sigma_0^2$. It can be shown that the Neyman-Pearson
detector for this problem is given by the energy detector:

$$
\|\mathbf{Y}\|^2 = \sum_{i=1}^{L} Y_i^2 \ \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}}\ \tau
$$

The probability $\epsilon$ of false alarm (opportunity overlook) is


Fig. 13. Performance comparison of the greedy approach and the optimal
strategy (in the upper plot, we have N = 3 independent channels with the
same bandwidth B = 1 and transition probabilities $\{\alpha = 0.2, \beta = 0.8\}$;
in the lower plot, N = 3, $\alpha = [0.8, 0.6, 0.4]$, $\beta = [0.6, 0.4, 0.2]$, B =
[3/4, 1, 3/2]).

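determined by the threshold $\tau$:

$$
\epsilon \triangleq \Pr\{\|\mathbf{Y}\|^2 > \tau \mid \mathcal{H}_0\} = 1 - \Gamma\left(\frac{L}{2}, \frac{\tau}{2\sigma_0^2}\right), \qquad (18)
$$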

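where $\Gamma(m, x) = \frac{1}{\Gamma(m)} \int_0^x t^{m-1} e^{-t}\, dt$ is the normalized incomplete gamma
function. The ROC for the Neyman-Pearson testing is thus
given by

$$
1 - \delta = \Pr\{\|\mathbf{Y}\|^2 > \tau \mid \mathcal{H}_1\} = 1 - \Gamma\left(\frac{L}{2}, \frac{\sigma_0^2}{\sigma_1^2}\eta\right), \qquad (19)
$$

where $\eta$ satisfies $\Gamma\left(\frac{L}{2}, \eta\right) = 1 - \epsilon$. The ROC curves for
different SNRs and numbers L of samples are shown in
Figure 8.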
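The relation between the threshold, the false-alarm probability in (18) and the miss-detection probability in (19) can be explored with the sketch below. To stay within the Python standard library it assumes an even number L of samples, for which the normalized incomplete gamma function has a finite-sum closed form; the function names and the bisection search for the threshold are assumptions of this example.

```python
import math


def reg_lower_gamma(k, x):
    """Normalized lower incomplete gamma function for a positive integer k.

    Uses the identity P(k, x) = 1 - exp(-x) * sum_{j=0}^{k-1} x^j / j!.
    """
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def _half(L):
    if L <= 0 or L % 2 != 0:
        raise ValueError("this sketch only supports a positive even L")
    return L // 2


def false_alarm(tau, L, var0):
    """epsilon in (18): energy above tau although the channel is idle."""
    return 1.0 - reg_lower_gamma(_half(L), tau / (2.0 * var0))


def miss_detection(tau, L, var1):
    """delta in (19): energy at most tau although the channel is busy."""
    return reg_lower_gamma(_half(L), tau / (2.0 * var1))


def threshold_for_miss(zeta, L, var1, iters=200):
    """Find tau such that the miss-detection probability equals zeta (0 < zeta < 1)."""
    lo, hi = 0.0, 1.0
    while miss_detection(hi, L, var1) < zeta:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if miss_detection(mid, L, var1) < zeta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    L, var0, var1 = 10, 1.0, 4.0  # illustrative values, not taken from the paper
    for zeta in (0.05, 0.15, 0.3):
        tau = threshold_for_miss(zeta, L, var1)
        print(f"zeta={zeta:.2f}  tau={tau:.3f}  epsilon={false_alarm(tau, L, var0):.4f}")
```

Setting the miss-detection probability equal to the collision constraint fixes the threshold, and the resulting false-alarm probability is the quantity that enters the greedy rule (13) and the belief update (14).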

In Figure 14 we study the performance of the proposed
greedy approach (using the above specified spectrum sensors)
as a function of the maximum collision probability $\zeta$ allowed
by the primary network. In the upper plot, we focus on the
secondary user. We can see that as $\zeta$ increases, the throughput
of the secondary user approaches the performance achieved by
the optimal protocol in the absence of sensing errors. This is
because with a large $\zeta$, the probability $\epsilon$ of overlook can be
very small, leading to improved throughput for the secondary
user at a price of more collisions with the primary network.
As shown in the lower plot, due to frequent collisions, the
overall spectrum efficiency considering both the primary and
secondary users decreases when $\zeta$ is large. The best spectrum
efficiency is obtained when $\zeta$ is around 0.15.

Multiple Secondary Users with Random Message Arrivals. In
this example, we study the performance of the greedy approach
when there are multiple secondary users. We assume
that sensing errors are negligible. Message arrivals at the


Fig. 14. OSA performance in the presence of sensing error (N = 3
independent channels with the same bandwidth B = 1 and transition
probabilities $\{\alpha = 0.4, \beta = 0.5\}$).

secondary users form a Poisson process with rate $\lambda$. The
message length is geometrically distributed with an average
message length of 50 packets (the transmission time of one
packet is assumed to be one slot). If a message arrives, the
whole message will be randomly assigned to a secondary user.
In each slot, those secondary users who do not have packets
to transmit will turn to sleep mode: they do not participate
in channel selection and sensing, and their belief states are
updated according to the Markovian model of spectrum occupancy.
Secondary users with packets to send will choose
channels according to the greedy approach¹⁰, and then update
their belief states according to the sensing outcomes. If an
available channel is chosen by multiple users, we assume that
one of these users will succeed. Shown in Figure 15 is the
throughput measured in bits/slot of the secondary users as a
function of the message arrival rate $\lambda$. We can see that the
throughput of the secondary users increases with $\lambda$.
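The belief update of a sleeping user, which relies only on the Markovian occupancy model, can be sketched as follows. The function name and the choice to express multiple sleeping slots as repeated one-step predictions are assumptions of this example.

```python
def sleep_update(omega, alpha, beta, slots=1):
    """Propagate a belief vector through `slots` slots without any observation.

    Each step applies omega_i <- omega_i * beta_i + (1 - omega_i) * alpha_i,
    the same rule used for unobserved channels in (14).
    """
    for _ in range(slots):
        omega = [w * b + (1.0 - w) * a for w, a, b in zip(omega, alpha, beta)]
    return omega


if __name__ == "__main__":
    # A user that last saw the channel available and then sleeps.
    for k in (1, 2, 5, 50):
        print(k, sleep_update([1.0], [0.2], [0.8], slots=k))
```

With $\alpha = 0.2$ and $\beta = 0.8$, the belief decays geometrically toward the stationary value 0.5, so a user waking up after a long sleep effectively starts from the stationary distribution.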

VII.  Conclusion 

We have presented in this paper an approach to decentralized
MAC for ad hoc OSA networks. A novel feature
of this work is the exploitation of opportunities at the slot
level, allowing low-rate applications (such as sensor nodes)
to coexist with primary users. The framework of POMDP makes
the MAC cognitive; an opportunistic user makes optimal
decisions for sensing and access based on the belief vector
that summarizes the knowledge of the network state based
on all past decisions and observations. Our formulation also

¹⁰ Note that if the maximum immediate rewards of several channels are
the same, the secondary user will randomly choose one of these channels to
sense.

Fig. 15. Multiple secondary users with random message arrivals (N =
10 independent channels with the same bandwidth B = 1 and transition
probabilities $\{\alpha = 0.2, \beta = 0.8\}$; three secondary users).

allows  the  integration  of  sensing  errors  and  other  practical 
impairments  into  the  POMDP  modeling. 

Appendix:  Proof  of  Proposition  1 

We show that when N channels evolve independently, $\Lambda(t)$ can be
obtained from $\Omega(t)$. Without loss of generality, we consider N = 2.
Let T(t) denote the information obtained up to the beginning of slot
t. Let $T_n$ denote the most recent time instant when channel n is
chosen. We can thus write an entry of $\Lambda(t)$ as in (20). Quantities
in (20) are entries of $\Omega(t)$. Hence, $\Omega$ is a sufficient statistic when
channels evolve independently.